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Abstract. Science and engineering problems frequently require solving a sequence of dual linear 
systems. Besides having to store only few Lanczos vectors, using the BiConjugate Gradient method 
(BiCG) to solve dual linear systems has advantages for specific applications. For example, using 
BiCG to solve the dual linear systems arising in interpolatory model reduction provides a backward 
error formulation in the model reduction framework. Using BiCG to evaluate bilinear forms - for 
example, in quantum Monte Carlo (QMC) methods for electronic structure calculations — leads to a 
quadratic error bound. Since our focus is on sequences of dual linear systems, we introduce recycling 
BiCG, a BiCG method that recycles two Krylov subspaces from one pair of dual linear systems to 
the next pair. The derivation of recycling BiCG also builds the foundation for developing recycling 
variants of other bi-Lanczos based methods, such as CCS, BiCGSTAB, QMR, and TFQMR. 

We develop an augmented bi-Lanczos algorithm and a modified two-term recurrence to include 
recycling in the iteration. The recycle spaces are approximate left and right invariant subspaces 
corresponding to the eigenvalues closest to the origin. These recycle spaces are found by solving a 
small generalized eigenvalue problem alongside the dual linear systems being solved in the sequence. 

We test our algorithm in two application areas. First, we solve a discretized partial differential 
equation (PDE) of convection-diffusion type. Such a problem provides well-known test cases that 
are easy to test and analyze further. Second, we use recycling BiCG in the Iterative Rational Krylov 
Algorithm (IRKA) for interpolatory model reduction. IRKA requires solving sequences of slowly 
changing dual linear systems. We analyze the generated recycle spaces and show up to 70% savings 
in iterations. For our model reduction test problem, we show that solving the problem without 
recycling leads to (about) a 50% increase in runtime. 

Key words. Krylov subspace recycling, deflation, bi-Lanczos method, Petrov-Galerkin formu- 
lation, BiCG, model reduction, rational Krylov, H2 approximation. 
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1. Introduction. We focus on solving the sequence of dual linear systems, 

A U) x U) =b U) j A^*x U) =U j) , (1.1) 

where A^> e C" x " and b^\lP' G C™ vary with j, the matrices are large and 
sparse, the solution of the dual system is relevant, and the change from a pair of 
systems to the next is small. 

In several application areas, there are important advantages to solving dual linear 
systems using the BiCG algorithm [20] . BiCG has a short-term recurrence, so very 
few Lanczos vectors have to be stored. In addition, using BiCG to solve the dual linear 
systems arising in interpolatory model reduction provides a backward stable method 
(with respect to the interpolation conditions) for computing a reduced order model |12) 
(see Section 5.2). This makes BiCG attractive even for symmetric positive definite 
(SPD) systems. Furthermore, in several applications, such as QMC algorithms [5], 
we need to evaluate bilinear forms of the type where u,w G C™ and A is 

non-Hermitian. Solving dual linear systems for u and w to compute u*A~ 1 w provides 
a quadratic error bound |48j . 

Since BiCG is advantageous for solving dual linear systems and we need to solve 
a sequence of such systems, we focus on Krylov subspace recycling for BiCG. We refer 
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to our recycling BiCG method as RBiCG. In addition, the BiCG algorithm forms the 
basis of other popular bi-Lanczos based algorithms like CGS [35] , BiCGSTAB [50] . 
QMR gl], and TFQMR [22]. Hence, the derivation of RBiCG is also useful for 
developing recycling variants of these algorithms [3J. 

The convergence of Krylov subspace methods for solving a linear system, to a 
great extent, depends on the spectrum of the matrix, and the deflation of eigenval- 
ues close to the origin usually improves the convergence rate [37l [47] . If the Krylov 
subspace is augmented with an eigenvector, then the associated eigenvalue is effec- 
tively deflated. Likewise, for BiCG, it can be shown that if the dual Krylov subspace, 
K l (A^*, fo), is augmented with left eigenvectors, the corresponding right eigenvec- 
tors are removed from the primal residual (and vice versa if the primal Krylov subspace 
is augmented with right eigenvectors) |15) . Therefore, while solving a pair of systems, 
we select approximate left- and right invariant subspaces of A^' (corresponding to 
small eigenvalues in absolute value), and use these to accelerate the solution of the 
next pair of systems. This process is called Krylov subspace recycling, and leads to 
faster convergence for the next pair of systems. 

For solving a single linear system, 'recycling' has been used in the GCROT [16] 
and the GMRES-DR 37 algorithms. For solving a sequence of linear systems, this 
idea was first proposed in [38] . where it is applied to the GCROT and the GCRO- 
DR algorithms. Recycling techniques are adapted to short term recurrences in the 
RMINRES [53] algorithm; see [36] for an improved version. GCROT as in [38 , 
GCRO-DR, and RMINRES all focus on solving a sequence of single systems rather 
than a sequence of two dual systems, which is the focus here. For a fixed matrix 
with multiple right hand sides deflation-based approaches are proposed in [3] [T] . For 
a comprehensive discussion of recycling algorithms see [55] . 

In addition to testing RBiCG for IRKA [29] for interpolatory model reduction, 
we test RBiCG for a model convection-diffusion problem. PDEs of this type are 
pervasive in science and engineering, they lead to nonsymmetric matrices for which 
BiCG may be well-suited, and they provide well-known test cases that are easy to 
reproduce and to analyze further. Convection-diffusion problems arise, for example, 
in the Oseen problem (a fixed-point linearization of the Navier-Stokes equations), in 
chemically-reacting flows, heat flow in a medium with transport, and so on. Moreover, 
any large discretized PDE leads to a potential model reduction problem, for example, 
for uncertainty quantification, for optimizing an engineering process, or indirectly 
estimating parameters in the model using measurements. We analyze the generated 
recycle spaces for both test problems, and we show up to 70% reduction in the iteration 
count. For our model reduction test problem, using BiCG instead of RBiCG would 
take approximately 50% more time to generate the reduced order model. As recycling 
is not needed for every pair of linear systems, this means that the improvement in 
time for those systems where recycling is actually used is substantially larger (see 
section 6). 

To simplify notation, we drop the superscript j in (jl.ip . At any particular point 
in the sequence of systems, we refer to Ax = b as the primary system and A*x = b as 
the dual system. Throughout the paper, || ■ || refers to the two-norm, (•,•) refers to 
the standard inner product, and is the i-th canonical basis vector. Unless otherwise 
stated, we refer to the primary system recycle space and the dual system recycle space 
collectively as the recycle space. 

In the next section, we briefly discuss the BiCG algorithm, and in section [3] we 
derive the RBiCG algorithm using a previously computed recycle space. How to 
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compute or update such a recycle space efficiently is discussed in section 2] After 
explaining the basics of interpolatory model reduction, we discuss how RBiCG is 
applied in IRK A in section [5] We present numerical experiments and results in section 
[6] and conclusions in Section [JJ 

2. The BiCG algorithm. For the primary system, let xq be the initial guess 
with residual r$ = b — Axq. Krylov subspace methods, in general, find approximate 
solutions by projection onto the Krylov subspace associated with A and r$ [ST]. The 
i-th solution iterate is given by 

Xi=x + ft, (2.1) 

where Qi £ K' l (A, r ) = span{ro, Ar , A 2 r , • • • , A l ~ 1 r } is defined by some projec- 
tion. The BiCG method defines this projection using the Krylov subspace associated 
with the dual system, leading to two bi-orthogonal bases and a pair of three-term or 
coupled two-term recurrences. This method is called the bi-Lanczos method [34l [20] . 
We initialize the Lanczos vectors as follows: 

V\ = ro/||r ||, v\ = ro/\\r \\. 

Defining Vi — [vi V2 ■ ■ ■ Vi] and Vi = [vi v<i ... Vi], the (i + l)-th Lanczos vectors are 
given by 

jv i+ i = Avi - ViT J_ V i} 7^+1 = A*Vi - Vif _L Vi, 

where the scalars 7 and 7 and the vectors r and f are to be determined. This 
bi-orthogonality condition leads to a pair of 3-term recurrences (see [S]), so that 
computation of the (i + l)-th Lanczos vectors requires only the i-th and the (i — l)-th 
Lanczos vectors. These 3-term recurrences are called the bi-Lanczos relations, and 
they are defined as follows: 

AVi = V l+ iT t = ViTi + t i+1 ,iV i+1 ef, 
A*Vi = Vi+xtt = Vf t + t l+hl v l+1 ef, 

where Ti are i x i tridiagonal matrices, it+i,» is the last element of the last row of 
T. e c( i+1 ) xi , and k+x,i is the last element of the last row of T_ L e C( i+1 ) x \ 

The next step is to find approximate solutions by projection. To exploit the effi- 
ciency of short-term recurrences in the bi-Lanczos algorithm, we use the bi-orthogonality 
condition to define the projection. This leads to a Petrov-Galerkin approach. Since 
the columns of Vi form a basis for K l (A,r ), we can define Qi in (|2.ip as Qi = ViUi, 
and the bi-orthogonality (or Petrov-Galerkin) condition then implies 

n = b - A(x a + Qi) =r - AViUi ± Vi. 

The vector yi is defined by this orthogonality condition. The solution iterate for 
the dual system, Xi, is similarly defined by Xi — xq + Vijji and Pi J- Vi. Further 
simplifications lead to the standard BiCG algorithm (Algorithm 1) [20l [5Tj . 

Next, we briefly discuss the breakdown conditions in BiCG and their reme- 
dies |26l I51j . The first breakdown happens when, at any step i, r*ri — 0. This 
is a breakdown in the underlying bi-Lanczos algorithm and is referred to as a serious 
breakdown. There exist so-called look-ahead strategies [30] to avoid this break- 
down. In addition, the two-term recurrence for the solution update requires a pivotless 
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Algorithm 1. BiCG (adapted from '51]) 

1. Choose initial guesses Xq and Xq. Compute ro = b — Axq and f o = 6 — A*xq. 

2. if (ro, fo) = then initialize So to a random vector. 

3. Set po = 0, po = 0, and /3q = 0. Choose tol and max_itn. 



4. 


for i = 


= 1 . . . max_itn do 


o 


Pi = 


n-i + pi-iPi-i. 


o 


Pi = 


fi-l + Pi-lPi-l> 


o 


1i = 


A Pl . 


o 


Qi = 


A*p t . 


o 


a-i = 


(h-i,ri-x)/(pi,qi 


o 


X j 


Xi_i + CtiPi. 


o 


Xi — 


Xi-i+ a-ipi. 


o 


Ti = 


n-i - aiQi. 


o 


fi = 


h-i - &m- 



o if ||rj|| < tol and \ \fi\\ < tol then break. 

o ft = (ri,n)/(fi-x,ri-i). 
5. end for. 



LDU decomposition of the tridiagonal matrix Tj, which may not always exist. This 
breakdown is referred to as a breakdown of the second kind, and it can be avoided 
by performing the LDU decomposition with 2x2 block diagonal elements [8]. The 
breakdown conditions in RBiCG are the same, and similar solutions can be applied. 
Therefore, and for the sake of brevity, we do not discuss breakdowns for RBiCG sepa- 
rately, and we'll assume henceforth in our derivations that breakdowns do not occur. 
Note that extensive experiments show that BiCG works well, and that breakdowns 
rarely happen in practice [HI [30] . 

3. Recycling BiCG: Using a Recycle Space. In this section, we modify the 
BiCG algorithm to use a given recycle space. First, we briefly describe the recycling 
idea used in the GCRO-DR algorithm. After solving the j-th primary system in 
(jl.ip . GCRO-DR computes the matrices U, C G C" xfc , such that range(fj) is an 
approximate invariant subspace of A® , A^ +1 ^ U = C and C*C — I. It then computes 
an orthogonal basis for the Krylov subspace K l ((I — CC*) A, (I — CC*) ro). This 
produces the Arnoldi relation 

AVi - CC*AVi + Vi+iEi ^ (I- CC*)AV t = V l+1 H t , 

where H_ i is an (i + 1) x i upper Hessenberg matrix. GCRO-DR finds the residual- 
minimizing solution over the (direct) sum of the recycle space, range([7), and the new 
search space generated, range(l/;). 

In RBiCG, we use the matrix U, derived from an approximate right invariant 
subspace of A^ , to define the primary system recycle space, and compute C = 
A^ +1 ^U. Similarly, we use the matrix U , derived from an approximate left invariant 
subspace of A^\ to define the dual system recycle space, and compute C = A^ +1 '*U. 
Instead of C being an orthogonal matrix, U and U are computed such that C and 
C are bi-orthogonal; see Section 14.31 The number of vectors selected for recycling is 
denoted by k, and hence, U, U, C, and C £ C nxk . Next, we derive an augmented bi- 
Lanczos algorithm that computes bi-orthogonal bases for the primal and dual Krylov 
subspaces. The two-term recurrence for the solution update in RBiCG is derived in 
Section O 
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3.1. The Augmented Bi-Lanczos Algorithm. The standard bi-Lanczos al- 
gorithm computes columns of Vi and V such that, in exact arithmetic, Vi _!_b V, 
where J_b denotes bi-orthogonality; this implies that V*Vt is a diagonal matrix. Since 
we recycle spaces U and U, the bi-Lanczos algorithm must be modified to compute 
the columns of Vi and V such that either 



or 



[U V] U [U Vi 
[C V,] L b \c V 



(3.1) 
(3.2) 



We choose to implement (|3.2j) . because it leads to simpler algebra and hence a more 
efficient algorithm. It also has the advantage that the RBiCG algorithm has a form 
similar to the standard BiCG algorithm. Next, we derive the recurrences that im- 
plement (I3.2j) . where C -Lb C has already been satisfied. The latter relation is easy 
to implement when computing the recycle space. Indeed, we can compute C and C 
such that C*C is a real, positive, diagonal matrix; see Section 4.3. As in the BiCG 
algorithm, we assume v% and v\ are available from the initial residuals Vq and ro- 
We make this statement more precise below. The (i + l)-th Lanczos vector for the 
primary system is computed by 



= Avi - ViT - Cp _l_ 



CV 



(3.3) 



where 7, r, and p are to be determined. Combining 
following equations, 



C*Av t 
VtAvi 



d33) and dH3), we get the 



(3.4) 



where T>i = V*Vi and T> c = C*C are both diagonal matrices and V c has real, positive 
coefficients (see Section l4~3l) . As discussed before, a breakdown in the standard BiCG 
algorithm because of singular T>i can be fixed with look-ahead strategies. Assuming 
breakdowns do not occur, we can solve for r and p in Q3.4[) and choose a normalization 
7; substituting these into (|3.3I) gives the (i + l)-th Lanczos vector. Because of the 
bi-orthogonality condition (|3. 21) . the full recurrence for Vi+i reduces to a (3 -I- fc)-term 
recurrence, where k is the number of columns of C. This implies that the computation 
of the (i + l)-th Lanczos vector requires the i-th. and (i — l)-th Lanczos vectors and 
C . Similarly, we get a (3 4- fc)-term recurrence for computing the Lanczos vectors for 
the dual system. We refer to this pair of (3 + fc)-term recurrences as the augmented 
bi-Lanczos relations; they are given by 



(/ - CC*)AVi - Vf+iTU, (I - CC*)A*Vi = V+xti 



where 



C = 
C = 



5J Cl 



c 2 



c 2 

<"'<■_» 



Cfc 
CfcC fc 

Ck 



= cv-* = cv~ x , 



= CV: 



Using (|3.2p . we can rewrite (|3.5I) as 

AiVi = Vi+iTj, where A x = (I - CV- l C*)A(I - CV^C*), 
A\Vi = Vi+iZi, where A* = (I — CV~*C*)A*(I - CV-*C*), 



(3.5) 



(3.6) 



(3.7) 



() 



AHUJA, DE STURLER, GUGERCIN, CHANG 



since C*V{ — and C*Vi = 0. This new form of the augmented bi-Lanczos relations 
simplifies the derivation of the recurrence for the RBiCG solution update, because 
the operators (I3.7[) are each other's conjugate transpose. Note that the additional 
orthogonalizations in (13. 7[) need not be carried out in an actual algorithm (see Algo- 
rithm [372]). 



3.2. The Solution Update for the Augmented Bi-Lanczos Recurrence. 

The i-th solution update in the RBiCG algorithm becomes 



xq + U Zi + V i y l , Xi — x + USi + Vifii, 



(3., 



With recycling, the bi-orthogonality condition (13.21) defines the Petrov-Galerkin con- 
dition, 



n = r - AU z l - AViyi -L 



cv,. 



r i = r -A*Uz i -A*V i y i ±[CV i }. (3.9) 



For the remainder of this section, we focus on the primary system. The derivations for 
the dual system are analogous. The computation of Zi and yi can be implemented more 
efficiently than (EU) suggests. Defining C = \\(I-CC*)r \\ and vi = CH 1 ~CC*)r , 
we get 



r = CC*r + (I - CC*) r = [C V i+1 ] 
Using the augmented bi-Lanczos relation (j3.5[) we get 



C*r 



[C Vi+i] 



A[U V,] 

Substituting (13T01) and f3~TT]) in (|3T9j) gives 
[C V l+1 



I C*AV X 
Ti 



C* 



C*r 
Cei 



I C*AVi 
T s 



Zi 

Mi 



= 0. 



Using the bi-orthogonality condition Q3.2p in the above equation we ge10 

= 0. 



" C*r ' 




Cei 





/ C*AVi 
Ti 



(3.10) 



(3.11) 



(3.12) 



(3.13) 



(3.14) 



Therefore, yi and Zi are given by 

TiVi = Cei, 
Zi = C*r -C*AV iyi . 

Substituting (13.141) in (13.81) leads to the following solution update: 

x t = x + UC*r + (J - UC*A)V iyt , 

where yi is obtained from solving Tyi — £e±. All computations here are done with 
matrix- vector products and UC*A is not computed explicitly. 



1 Note that the length of the vector e\ in I I3.13H is one less than that of ei in I I3.12H . although 
both denote the first canonical basis vector. Also, Ti in l|3.13jl is T_ i without the last row, and hence 
is an i X i tridiagonal matrix. 
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Algorithm 2. RBiCG 

1. Given U and U compute C and (7 using (|3.6p . If £/ and [/ are not available, then 
initialize U, U, C, and C to empty matrices. 

2. Choose X-i, i_i and compute xq, xq, r$, and fo using (|3.15p . 

3. if (ro,fo) = then initialize x_i to a random vector. 

4. Set po = 0, po = 0, and @q = 0. Choose tol and max_itn. 



5. for i 


= 1 . . . max_itn do 






Pi 




Pi 


= fj_i + 


Zi 


= Apr, 




= A*p l 


* Ci 


= C* Z{; 


Ci 


= C* Si; 


* <7i 


— z i — CQi] 


<ii 


— Si — CQ 


O tti 


= (fi-i,ri-i)/(pi,qi); 


Oii 


= Oii 


* Cc 


= Cc + aid; 


Cc 


= Cc + <SiO 


O 


= Xi-l + OLiPi 


X i 


= Xi-i + OtiPi 


O Ti 


= Ti-i - aiqi 


h 


= fi-i - onqi 



° if I kill — to l an d |^i|| < "tol then break 

o p i = {r i ,r i )/(r i -\,r i -i) 

6. end for 

7. ^i — UC,c> Xi — U (^c 



We introduce a slight change of notation to make future derivations simpler. Let 
X-\ and X-i be the initial guesses and r_i = 6 — Ax-\ and f_i = 6 — A*£_i the 
corresponding initial residuals. We define 

i = i-i + C/C'r_i, r = (/- Cd*)r- lt , 

x = x-i +Ufr?-i, r = (i - C&)?-i, { ' ' 

and follow this convention for xq, xq, t*o, and fo for the rest of the paper. Let 

Ti — LiDiRi, 

Gi = {I -UC*A)V l R7\ 

<Pi = (D- 1 Lr 1 e 1 . 

As in the standard BiCG algorithm, an LDU decomposition (without pivoting) of Ti 
might not always exist. We can avoid this breakdown in the same way as done for 
BiCG (see Section[2]). The two-term recurrence for the solution update of the primary 
system is now given by 

Xi = Xi-i + PijGiei for i > 1, 

where ipij is the last entry of the vector ipi, and xq is given by (|3.15p . An analogous 
update can be derived for the dual system. Note that we never compute any explicit 
matrix inverse. The matrices under consideration, Di, Li, and Ri, are diagonal, lower 
triangular, and upper triangular respectively. 

This two-term recurrence can be simplified such that Ti is not needed explicitly. 
To derive further simplifications, we use the operator A\ (instead of A) and follow 
steps similar to the ones used in the derivation of BiCG [30] . Algorithm 2 provides an 
outline of RBiCG. Some algorithmic improvements to make the code faster are not 
given here; see [3] for further details. 
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4. Recycling BiCG: Computing a Recycle Space. We use the matrices U 
and U to define the primary and dual system recycle spaces. The recycle space used 
in solving a linear system is fixed throughout the RBiCG iteration; however, the basis 
of the recycle space for the next pair of linear systems is updated periodically using 
the bi-Lanczos vectors. We use harmonic Ritz vectors, with respect to the current 
Krylov subspace, to approximate left- and right invariant subspaces cheaply. 

We use the following definition [44]. Let S be a subspace of C™. Then A € C is a 
harmonic Ritz value of A and u G S ^ its corresponding harmonic Ritz vector with 
respect to the subspace W = AS if 

{Au - Am) 1 AS. (4.1) 

In Section 14. 1( we derive a small generalized eigenvalue problem whose solution gives 
the desired approximate invariant subspace. The first pair of systems in our sequence 
of dual linear systems requires special attention, since there is no recycle space avail- 
able at the start. We discuss this case in Section 1431 In Section l4~3l we describe the 
construction of the bi-orthogonal C and C in (|3.2p such that T> C — C*C has positive 
real coefficients. Although, the generalized eigenvalue problem derived in Section |4~T1 
is of a small dimension, it would be expensive to set up in a straightforward manner. 
We show how to set up the problem efficiently using recurrences in Section 14.41 

4.1. Computing an Approximate Invariant Subspace. We need a sequence 
of consecutive Lanczos vectors Vi and x>i and tridiagonal matrices and Tj to build 
the recycle space. There is a degree of freedom in choosing the scaling of the Lanczos 
vectors [26l SOI E] . The following scaling yields % = T* (using fl3J|) and (|4~2]) ): 

|M| = 1, {vi,Vi) = l. (4.2) 

Hence, the Lanczos vectors are computed as follows: 

n-x _ ri_i 

Vi = Ti TP Vi = 7 = — \ ■ 

|r-i|| \Vi,ri-x) 

Ti can be computed using the residuals and iteration scalars of the RBiCG iteration 
as follows [31 13] : 

/ J_ _lkoll . Pi \ 

CKi ||n|| Ql 

.Unii . J_ J_ + Ok, IM §1 

\\ro\\ ai ct2 ai 1 1 1™2 1 1 a2 



T t = 



all ft 



I In-ill 

, lki-i|| i J_ + ft^i . 

\ ||r»_2|| cti-i on m-i / 

Instead of using all the Lanczos vectors to update the recycle space, we update the 
recycle space periodically. This strategy keeps the memory requirements modest |53j , 
as it allows us to discard Lanczos vectors periodically. The iteration process between 
two updates of the recycle space is referred to as a "cycle" . The length of the cycle, 
s, refers to the number of iterations between updates. Let Vj and Vj contain the 
Lanczos vectors generated during the j cycle, 

V i = • ■ • v js] , Vj = [v(j-i) s +i ■ ■ ■ v 3 s] ■ 
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Also, let 



T, = [t 



T — 



' J (j-l)s 



where Uy. 



1)8 



and 



\j-l)8 



are the last Lanzos vectors from the previous cycle, and 



'Jjs+l 



and Vjs+i are the first Lanzos vectors from the next cycle. The augmented 



bi-Lanczos relations for the j cycle are now given by 

(/ - CC*)AVj = T 3 r i; (I - CC*)A*V 3 = f j f 



3 ' 



(4.3) 



where Tj, f j G (£(«+2)xs are y>^ respectively, with an extra row at the top (cor- 
responding to V(j_u s and V(j_i) s ) and at the bottom (corresponding to Vj S+ \ and 

Vjs+l)- 

The discussion in this paragraph concerns only the primary system. However, 
an analogous discussion applies to the dual system. Let U define the recycle space 
available from the previous linear system and Uj-i the recycle space generated at the 
end of cycle (J — 1) for the current linear system. We want to obtain an improved 
Uj from Vj, Uj-i, and U. It is important to note that Uj is not used for solving the 
current linear system. At the end of solving the current linear system, the final Uj 
will be U for the next linear system. There are several choices for selecting Uj [S3]. 
For simplicity, we build Uj from range([f7j_i Vj]). 

Based on the choices discussed in the previous two paragraphs, we first define 
certain matrices, and then we derive the generalized eigenvalue problem whose solution 
gives the approximate invariant subspace. Let 



*J = Pj-i Vj] , % = [C Cj. x Tj] , Hj = 



*,= Ui-iVj , = C Cj-x Tj , H. 



Bj 

1 

o r, 

Bj 

1 
f 7 



where Cj-i = AUj-i, Bj = C*AVj, Cj-x = A*Uj-i, and B 3 = C*A*Vj. Then, the 
augmented bi-Lanczos relations f)4.3[) lead to 



A® 4 



^Hj, 



A* $4 



(4.4) 



In RMINRES [S3J , harmonic Ritz pairs of A with respect to the subspace range( A&j ) 
have been successfully used to build the recycle space. Since we work in a Petrov- 
Galerkin framework, it is more intuitive to use harmonic Ritz pairs with respect to 
the subspace range(-i4*$j), following 9 . This leads to simpler algebra and cheaper 
computations. Let (A,u) denote an harmonic Ritz pair of A. Then, we derive A and 
u G range(<I>j) from the condition 

(Au - Ait) .I range \A*®A . (4.5) 

Taking u = &jW and substituting (|4.4I) in (j4.5[) gives 

(A**X A<P jW = A U*$X $ jW o (*jHj) * *jHjW = A * $jW. 
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Thus, condition 



leads to the generalized eigenvalue problem, 



H*^ 3 H 3 w 



j j J 



(4.6) 



Let the columns of Wj be the k right eigenvectors corresponding to the eigenvalues 
closest to the origin. Then, we take Uj = <S>jWj. See [3] for an analogous derivation 
of the dual system recycle space. 

4.2. The First Linear System and the First Cycle. For the first cycle of 
the first system, the matrices U, U, Uj-\, and Uj-\ are not available. Letting T\ 
and Ti denote the tridiagonal matrices for the first cycle, we consider the following 
eigenvalue problems: 



T\w = Aw, 



T\W = fJLW. 



Since T\ = T* , we solve for the left and the right eigenvectors of T±, W\ and Wi 
respectively. Hence, we take 



u 1 = v 1 w 1 , 



Ui = ViWi. 



During the second and subsequent cycles of the first linear system, Uj—\ and 
Uj-i are available, but C and C are not. Redefining ^ 3 ,^ 3 ,H 3 , and Hj, we get the 
generalized eigenvalue problem (|4.6p with 



*j = [Uj-i V 3 ] 



*i = 



u 



J-l 



*i = 

Cj-i 



Ti 



H 3 = 






f , 



For the first cycle of each of the subsequent linear systems (i.e. j — 1), C and C 
are available, while U 3 -\ and U 3 -\ are not. Redefining $i, 4>i, ^i, i&i, Hi, and Hi, 
we get the generalized eigenvalue problem (|4.6[) with 



$1 



[U V{\ 
U Vi 



*1 
*1 



[C Zi] 



Hi 
#i 



£i 



where and t£i denote [Vi u s +i] and [Vi Sg+i] respectively. 

4.3. Constructing Bi-orthogonal Cj, Cj and C, C We need to compute 
the matrices Cj and Cj such that Cj Lb Cj at the end of each cycle. After solving 
the generalized eigenvalue problem (|4.6j) . we set (as initial choice) Uj = QjWj, Uj = 
3>j Wj , = AUj , and Cj = A* U 3 , and we compute the SVD 

C*Cj = MjVjN*, (4.7) 

such that a i > oi > ... > <jfc > 0. Given some tolerance tol > 0, we pick p 
such that a v > tol > a p +i (with both p — k and p = possible), and redefine 
Mj = [mi, . . . ,m p ] and Nj = [n 1; . . . , n p ], where rrii and rij are the left and right 
singular vectors corresponding to er^. Next, we redefine 

Uj = <S> 3 W 3 N 3 = [Uj^ V 3 ]W 3 N 3 , Uj = QjWjMj = [Uj^ V 3 ]W 3 M 3 , 
Cj = AU 3 = A[Uj^ V 3 ]W 3 N 3 , Cj = A*U 3 = A* [Uj-i V 3 ]W 3 M 3 . 



(4.8) 



RECYCLING BICG 



11 



By construction C*Cj is diagonal with real, positive coefficients Q 

Analogous to the above, at the start of each linear system (after the first), we need 
to compute C and C such that C J_b C, cf. (|3.2p . and the diagonal matrix T> c — C*C 
has real, positive coefficients. Taking initially for U and U the final matrices Uj and 
Uj from the previous pair of linear systems, we compute C = AU, C = A*U, and 
compute the SVD C*C — M£iV*. After this we proceed as for the computation of 
Cj and Cj. 

4.4. Efficiently Setting up the Generalized Eigenvalue Problem. The 

main cost of setting up the generalized eigenvalue problem (|4.6[) is in computing the 
matrices 





- c * 














^ J ^3 


c*_ 


l 






Cj-i 


Ti ] = 








L i 





















C* ' 








i o 








[ Uj-i 


*S ] = 


Cl-iUj 










3 - 






L t;^. 


i i 



C" 







where 7 is the s x s identity matrix with an extra row of zeros at the top and at 
the bottom. Most of the blocks in these matrices can be constructed efficiently by 
exploiting recurrences and various algebraic relations. 

The bi-orthogonality condition (I3.2j) and the construction of Cj and Cj (|4.7[) - (l4.8p 
give the following orthogonality relations. 



C 3 -2 A- T, 



C 



J-2 



_L To 



(4.9) 



Next, going from top-to-bottom and left-to-right, we analyze each block of ^j^j and 
SfrjQj in terms of its defining recurrences and simplify it using (pOi) , (|4.3|l . (|4.7p . (|4.8p . 
and (|4.9p . Blocks whose efficient computation is obvious or has already been detailed 
are skipped, and we focus on computations that are at least 0(n). 



&C.J-X = C*AUj- X = [ C*AUj_ 2 C*AVj-i } Wj-iNj. 



C*C 3 -2 C* (c6*AVj-i + Tj-xr^x) 
C*Cj- 2 V c Bj^ } Wj-xNj-x, 



Wj-iN. 



For this first block, we describe its efficient computation in some detail. The cost 
of computing C*Cj~i by direct multiplication is 0(k 2 n); so, it would be expensive. 
However, the submatrix C*Cj_2 is available from the previous cycle, and T> c is a 
diagonal matrix independent of the cycle (so both must be computed at most once 
per linear system). Furthermore, Bj-x has been computed during the (augmented) 
bi-Lanczos iteration. Finally, the matrix-matrix product [C*Cj-2 T> c Bj_x]Wj-iNj-x 
does not involve any 0(n) operation. Hence, this block can be computed quite cheaply. 
We give a brief overview of the cost of the RBiCG algorithm in Section [6] for a more 
detailed derivation, see [4]. 



c* c = m* r x 



c;_ 2 c 



2 For p = 0, no recycle space would be selected. This has never occurred in our experience. 
Indeed, discarding even one pair of vectors is rare. 
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The derivation of this block is similar to the previous block. As above, C*_ 2 C is 

available from the previous cycle, and Bj-i has been computed during the bi-Lanczos 
iteration. 



C*_ 1 T j = U*_ 1 AT j = MJ^WjU 



= M*_ X W*_ X 



TT* 
V* , 




AT 3 = M^W*^ 







CC*A*V,- Y + T,-_ir,-_i 







where 



V h-2)s+l 



V 



0-1)5+1 



[ "(3- 



l)s u (j-l)s+l 





" 







" 


= f*_ 1 















1 












_ 


1 




_ 


[ fjTj-_ 


ir, 


-1 ] 










" 





... 1 












... o 












... o 



where 



••• 
The derivation of this block is similar to the previous block. 

. cruj-! = c* [ Uj-2 Vj-i ] Wj^Nj-! = [ c*Uj- 2 o ] w^n^, 

where C*Uj-2 is available from the previous cycle (such a block must be computed 
at most once per linear system). 



= M*_ 1 W*_ 1 



TT* 

V* , 
.7-1 



cuvj-i 



^ j — 2 v 3 

V^Cj-2 Vf^AVj-! 



W,-_iiV 7 --i, 



where Cj_ 2 Uj-2 and C*_ 2 Vj-\ arc submatrices of \I>j_ 1 <I>j_i, Vj'_ 1 Cj-2 is a submatrix 
of T^_ 1 Cj_2 and is available from ^_ 1 *_ 7 -_i, and V*_ 1 AV : j-i = Tj-i. 

• C*_ x Vj is a submatrix of Cj^Yj, and hence, is available from 

Therefore, only TjUj-i needs to be computed explicitly. 
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5. Model Reduction. Consider a single-input/single-output (SISO) linear time- 
invariant (LTI) system represented as 

G . = A ^ + hv{t) or GOO = c*(sE - A)-H (5.1) 

\y(t) = c*x(t), 

where E,A e R nxn and b,ce R". The time-dependent functions v(t), y(t): R -» M 
are the input and output of G(s), respectively, and x{t) : R — J> R™ is the associated 
state. In (I5.ip . G(s) is the transfer function of the system: Let V(s) and Y{s) denote 
the Laplace transforms of v(t) and y(t), respectively. Then, the transfer function 
G(s) satisfies Y(s) — G(s)V(s). By a common abuse of notation, we denote both the 
underlying dynamical system and its transfer function with G. The dimension of the 
underlying state-space, n, is called the dimension of G. Systems of the form f|5. 1[) 
with extremely large state-space dimension n arise in many applications; see [6] and 
[3"5] for a collection of such examples. Simulations in such large scale settings lead 
to overwhelming demands on computational resources. This is the main motivation 
for model reduction. The goal is to produce a surrogate model of much smaller 
dimension which provides a high-fidelity approximation of the input-output behavior 
of the original model G. Let r <C n denote the order of the reduced-model. The 
reduced-model is represented, similar to (|5.ip . as 

j E r x r (t) = A r x r (t) + b r v(t) 
°r(s) ■■ < , , ' or G r (s) = c*(sE r - A r ) L b r (5.2) 

[yr{t) = c r x r (t), 

where E r ,A r S R rxr and b r ,c r 6 R r . In this setting, the common approach is 
to construct reduced order models via a Petrov-Galerkin projection. This amounts 
to choosing two r-dimensional subspaces V r and W r and matrices V r G R nxr and 
W r € R" xr such that V r — Range(K-) and W r = Range(W / r ). Then, we approximate 
the full-order state x(t) as x(t) « V r x r (t) and enforce the Petrov-Galerkin condition, 

W* (EV r x r {t) - AV r x r {t) - bv{t)) = 0, y r (t) = c*V r x r {t), 

leading to a reduced-order model as in (|5.2[) with 

E r = W*EV r , A r = W*AV r , b r = W*b, and c r = V; C . (5.3) 



As (|5.3p illustrates, the quality of the reduced model depends solely on the selection 
of the two subspaces V r and W r . In this paper, we will choose V r and W r to enforce 
interpolation. For other selections of V r and W r , we refer the reader to [6]. 

5.1. Interpolatory Model Reduction. For a given full-order model G(s), the 
goal of interpolatory model reduction is to construct a reduced-order model G r (s) via 
rational interpolation. Here, we focus on Hermite interpolation. Given the full-order 
model (15.11) and a collection of interpolation points (also called shifts) a.- L € C, for 
i = 1, . . . , r, we must construct a reduced-order system by projection as in (15. 3[) such 
that G r (s) interpolates G(s) and its first derivative at selected interpolation points, 
i.e., 

G(cTi) = G r (<Ji) and G'(tJi) = G' r (<Ji) for i = l,,..,r. 

Rational interpolation by projection was first proposed in [T71 [S51 [SB] . How to obtain 
the required projection was derived in |27j using the rational Krylov method [40] , For 
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the special case of Hermite rational interpolation, the solution of the interpolatory 
model reduction problem is given in Theorem 15. II For the more general case, we refer 
the reader to [27J and the recent survey [7J. 

Theorem 5.1. Given G(s) = c* {sE — A)^ 1 b and r distinct points ax, . . . , <j r € C, 

let 



V r = [(a 1 E-A)- 1 b...(a r E-A)- 1 b], 



w: 



c*{a\E — A)~ v 
c*{a r E-A)-\ 



(5.4) 



Using (5.3\) . define the reduced-order model G r (s) — c*(sE r — A r )~ 1 b r . Then G{ai) = 
G r ((Ti) and G'(<7j) = G' r (o~i), for i = l,...,r, provided that aiE — A and aiE r — A r 
are invertible for i = 1, . . . , r. 

Theorem 15.11 shows how to solve the interpolatory model reduction problem via 
projection for given shifts. However, it does not provide a strategy for choosing good/ 
optimal interpolation points. Recently, this issue has been resolved for the special case 
of optimality in the H,i norm )29j . The H.<i norm of the dynamical system G(s) is 
defined as 



I CI 



H-2 



1 

2^ 



GO) | 2 da 



1/2 



The %2 norm of G is the 2 — oo induced norm of the underlying convolution operator. 
Then, for any v G I? (M + ) , — y r ||,L°° < ||G — G r \\-H 2 IMIl 2 - To ensure that the output 
error y — y r is small in L°°(R + ) uniformly over all inputs v, say, with ||w||l 2 < 1) we 
seek a reduced system G r that makes \\G — G r ||-H 2 small. This leads to the optimal 
H2 model reduction problem: Given G(s), and a reduced order r < n, find G r (s) that 
solves 



\G-G r 



\H 2 



mm 



G G> 



(5.5) 



This problem has been studied extensively [35j EU EHl SSI E2 [28l UHl HI]- It is a 
non-convex optimization problem, which makes finding the global minimum, at best, 
a hard task. Hence, the common approach is to construct reduced-order models 
that satisfy, for an interpolatory model reduction framework, the following first-order 
necessary conditions. 

Theorem 5.2. (JM W$) Given G(s), let G r {s) = c* r {sE r - A^br be an H 2 - 
optimal reduced order model of order r, and let Ai, . . . , A r denote the poles of G(s). 
Then 



G(-Xi) = G r (-Xi) and G'(-Ai) = G' r (-k) for i = l,...,r. 



(5.6) 



So, the U.2 optimal approximant G r (s) is a Hermite interpolant to G(s) at the mirror 
image of its poles. These poles, the optimal interpolation points, are not known a 
priori. Hence, the iterative rational Krylov algorithm (IRKA) [25], starting from an 
initial selection of interpolation points, iteratively corrects the interpolation points 
until (|5.6[) is satisfied. Algorithm 3 outlines IRKA; for details, see [29] , 



5.2. Advantages of Approximating Solutions using a Petrov-Galerkin 
Framework in Interpolatory Model Reduction. The main cost in IRKA is solv- 
ing multiple linear systems to compute V r and W r . If the dimension of the state-space, 
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Algorithm 3. IRKA ( flffl j 

1. Make an initial shift selection o\ for i = 1, . . . , r, 

2. V r = [(fTiS - A)- 1 ^ . . . , (oy.E - A) _1 6] , 

3. W r = [((7iE - A)-*c, (a r E - A)-*c], 

4. while (not converged) 

o A r = W*AV r , E r = W?AV r , 

o <7, < Xi(A r ,E r ) for i = 1, . . . ,r, 

o K- = [((TiE - A)- 1 ^ . . . , (a r S - A)-^] , 
o W r = [{(ixE - A)~*c, (a r E - A)~*c], 

5. A r = W;AV r , E r = W;AV r , b r = W*b, c r = V*c r . 

n, is large, these systems are generally solved only approximately by an iterative solver. 
In this context, it is important to asses the accuracy of the computed reduced order 
model; that is, given the shifts, how accurately the Hermite interpolation problem 
is solved. This question was studied extensively in [TJ]. One of the major results, 
outlined below for our particular case, provides the main motivation for solving the 
linear systems associated with the corresponding columns of V r and W r as pairs of 
dual linear systems (in the terminology of Section [lj using BiCG or RBiCG. 

Let Vj and Wj, for j — 1, . . . , r, denote the approximate solutions of (ojE — A)vj = 
b and (<7jE — A)*Wj — c, respectively, with residuals rjj = (<7jE — A)iij — b and 
£j = {cjjE — A)*Wj —c. Furthermore, let Vj, Wj, rjj, and satisfy the Petrov-Galerkin 
condition that there exist spaces V and Q such that Vj € V, Wj € Q, rjj _L Q, and 
£j -L V . Define the approximate solution matrices (V r and W r ), the residual matrices 
(Rb and R c ), and the rank-2r matrix (F2 r ) as follows: 

V r = [i'l V2 ■ ■ ■ V r ], W r 

Rb = \r\\r]2 ■ ■ ■ Vr], Rc 

F 2r = Rbiw^Vr^w; + v*{w;v r )- x Ro- 

Also, define the inexact reduced-order order quantities 

A r = W*AV r , E r = W*EV r , K = W*b, and c r = V*c. 

Then, the computed reduced-order model G r (s) — c*(sE r — Ar)^ 1 ^ exactly interpo- 
lates the perturbed full-order model G(s) — c*(sE — (A + F 2r ))~ 1 b, i.e., 

G{a i ) = G r {a i ) and G'(^) = G' r (ai), for i=l,...,r. 

Hence, iteratively solving the linear systems while satisfying the Petrov-Galerkin con- 
dition above yields a backward error for the interpolatory model reduction that is 
bounded by ||f2r||, which is governed by the norms of the residuals. The latter are 
easily controlled in the iterative solver. For details, we refer the reader to [T2] . 

The easiest way to satisfy the Petrov-Galerkin condition above is by solving the 
dual pairs of linear systems using BiCG. Hence, BiCG is particularly suitable for 
solving the linear systems in IRKA (even for symmetric positive definite matrices). 
However, as IRKA leads to a sequence of dual linear systems, the RBiCG algorithm 
can be used to reduce the total run time for solving all linear systems. Moreover, 
if we solve the dual pairs of linear systems arising in IRKA by RBiCG, the Petrov- 
Galerkin condition is still satisfied. Hence, the resulting reduced-order model will be 
an optimal %2 approximation to a nearby full-order model. 



= [Wl W2 ■ ■ ■ w r \ 

= [66 ■■■&], 
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5.3. IRKA using RBiCG. IRKA usually converges rather fast [33]. Hence, 
after one or a few initial steps, the interpolations points from one step of IRKA to 
the next do not change substantially. Moreover, for many cases, the change of the 
(appropriately ordered) {cr,} from one column of V r (and W r ) to the next is also 
modest. Therefore, IRKA is expected to gain significantly from recycling. 

For the special case of E = I in (|5.1j) . alternative solution approaches might be 
advantageous, as one can solve the linear systems for multiple shifts at once [2TJ EH 
133 H5]. Combining these strategies with a Petrov-Galerkin framework does not seem 
complicated. Effective strategies for Krylov subspace recycling for solving systems of 
the type, (<JiI — A)vi = b, for multiple shifts at once, as well as for multiple right hand 
sides, was discussed in (32] ■ For most model reduction applications, however, E ^ I. 

There are three strategies for recycling Krylov subspaces in IRKA. For the first 
strategy, consider two consecutive steps of IRKA, say step m and m + 1 (iterations 
to and m + 1 of the while loop in Algorithm 3), with shifts crj , for i = 1, . . . , r and 

Vr {m) = [(a[ m) E - A)-*b, . . . , (4 m) E A)-% 

Wi m) = [(a[ m) E - A)~*c, (a r m) E - A)-*c\. [ ' ' 

at step 777 and with shifts cr. , for i = 1, . . . ,r and 

V r (m+1) = [(a{ m+1) E - A)- l b, . . . , (4 m+1) E - A)- l b], 
M/ r (m+1) = [(a ( ™ +1) E - A)~*c, (a r m+1) E - A)-*c] 

at step 777 + 1 of IRKA. One can recycle Krylov subspaces from the i th column of 
Vr™ 1 " 1 and Wr" 1 * 1 to the i th column of K' m+1 ' and wl m+1 \ That is, from solving the 
pair of linear systems 

/ ( m ) A\ ( m ) 7 / (m) A \^ ( m ) 

(a\ >E ~ A)v\ = b, (al >E - A) w\ = c, 
to solving the pair of linear systems 

/ (m+l) i \ (m+1) , / (m+1) ^ (m+1) 

(Uj E — A)v\ = o, (<t> E — A) w\ = c, 

where i — 1,2, ... ,r. This strategy for recycling strategy is useful when the change 
in a shift from one IRKA step to the next is small. 

For the second strategy, consider a single IRKA step. One can recycle selected 
Krylov subspaces from solving for one pair of columns of V r and W r to the next pair 
of columns across all the columns of the matrices V r and W r . In the third strategy, 
the first two recycling strategies are combined. We describe one such combination. 
Consider solving the system [a'f 1 ^ E — Ajv'f 1 ^ — b and its dual system. ^From a 
set of previously generated recycle spaces (distinguished by their shifts) , one can pick 
the recycle space from the system with the smallest relative change in a (and less than 
a relative tolerance) . This would ensure that the linear system from which the recycle 
space has been generated is close to the current one. A natural pool from which 
to pick the a defining the recycle space would be <r[ m \ . . . , a T m \ a^" l+1 \ . . . , a!-™i . 
The second and third recycling strategies are useful when the shifts at an IRKA step 
are clustered. 

For the experiments in this paper, r is small, and so the shifts at any particular 
IRKA step are spread far apart. Hence, we follow the first strategy. That is, for 
every shift, we recycle Krylov subspaces from one IRKA step to the next. In general, 
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the linear systems corresponding to the relatively large shifts converge fast, and so 
recycling Krylov subspaces is not useful for them. Therefore, we carry out recycling 
only for selected, small shifts. We give more details in Section 15721 

5.4. Previous Work in Recycling for Model Reduction. Recycling for in- 
terpolatory model reduction in the Galcrkin setting, i.e., with W r — V r , has been 
considered in [JJ] and [TS] - In this setting, there are no dual systems to solve, and 
therefore approaches based on GCR [THj and GMRES [H] are considered, respec- 
tively, for a sequence of (single) linear systems, as opposed to our approach based on 
BiCG for a sequence of dual linear systems. Also in other respects, the approach for 
improving the linear solver and the model reduction context are quite different from 
here. In P~J], the focus is on efficiently solving linear systems with a fixed coefficient 
matrix and multiple right hand sides (Ax^ = b^), recycling descent vectors (in 
GCR). Furthermore, the authors target model reduction with a single interpolation 
point but interpolating higher derivatives. 

6. Results. We first give a brief overview of the overhead in RBiCG. We focus 
on components with at least 0(n) cost, where n is the dimension of the linear system. 
Furthermore, k is the number of basis vectors in the primal (or dual) recycle space, 
and s is the number of iterations in a cycle. For every iteration, there is an extra cost 
of (8fc + 2)n flops, mostly from orthogonalizations. At the end of each cycle, there is 
an extra cost of (14fc 2 + 6ks + 16k + 4)n flops, mostly from setting up the generalized 
eigenvalue problem and computing biorthogonal Cj and Cj. Once per linear system, 
there is an extra cost of (10fc 2 + 28k + 14)n flops, mostly from computing biorthogonal 
C and C. A more detailed discussion of the overhead is given in [4] . Note that s and 
k are much smaller than n. For recycling to be beneficial, the savings in iterations 
should be sufficient to make up for the overhead. Further in this section, we show 
that the reduction in the number of iterations for (a pair of) linear systems may be 
as high as 70%. For our model reduction test problem, we show that computing a 
reduced model without recycling takes about 50% more time than with recycling. 

We test RBiCG on a convection-diffusion problem and on IRKA for interpolatory 
model reduction. All experiments are done using Matlab. 

6.1. Convection-Diffusion. To analyze RBiCG, we use the linear system ob- 
tained by finite difference discretization of the partial differential equation 

-{M x ) s - (A# v ) v + B(x,y)ti x = F : 

with A as shown in Figure RTTl fa). B(x, y) — 2e 2 ( x2+v ~\ and T = everywhere except 
in a small square in the center where T — 100 [50]; see Figure loTTT a') . The domain is 
(0, 1) x (0, 1) with Dirichlet boundary conditions 

<?(0, y) = 0(1, y) = §{x, 0) = 1, and &{x, 1) = 0. 

We use the second order central difference scheme with a mesh width of h — 1/64, 
giving a nonsymmetric linear system of 3969 unknowns. The convergence is similar 
for a problem that is four times larger. To enable further analysis, we give results for 
this smaller system size. The primary system right-hand side comes from the PDE. 
We take the vector of all zeros as the dual system right-hand side. In this case, we 
are concerned only about the primary system. 

To analyze RBiCG, we solve the (same) dual linear systems four times. The re- 
cycle space generated during the first run is used for solving the same dual systems 
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Fig. 6.1. RBiCG for a convection- diffusion problem, (a) The coefficients of the PDE. (b) 
Convergence for preconditioned RBiCG with s = 40 and k = 10 for the primary system solved four 
times to analyze convergence improvement as the recycle space improves. 



a second time, further improving the recycle space, and so on. This is a useful ap- 
proach for analyzing how well Krylov subspace recycling works, as it excludes the 
effects of changing matrices and of right-hand sides having different expansions in the 
eigenvector basis [55] • Hence, it provides an indication for reasonable sequences of 
systems how fast the recycle spaces converge and how much recycling approximate 
invariant subspaces is likely to improve convergence. For this experiment, we take 
s = 40 and k = 10. These are chosen based on experience with other recycling algo- 
rithms 38, 53] . The relative tolerance for RBiCG is taken as 10 -8 . The initial guess 
(for both systems) is a vector of all ones. The linear systems are split-preconditioned 
by a Crout version of the ILUT preconditioner with a drop tolerance of 0.05 [31]. The 
generated recycle spaces pertain to the preconditioned linear systems. 

Figure 16.1T b) shows the convergence improvement of RBiCG, as it solves the 
primary system multiple times. For the second run, the reduction in iterations is 
around 35%. The convergence improves further with each run. Next, we present a 
brief analysis of the generated recycle spaces. In Table 16.11 we give the cosines of 
the principal angles between the primary (dual) recycle space and the right (left) 
invariant subspace associated with the eight eigenvalues of smallest magnitude. As 
for the recycle spaces, the invariant subspaces are computed for the preconditioned 
operator. As the recycle space improves, the principal angles to tend to zero, and so 
the cosines tend to one. The table shows that with only a few runs, RBiCG accurately 
approximates increasingly larger subspaces of the invariant subspace. As a result, we 
see faster convergence for every new run. 

6.2. Model Reduction. Our test dynamical system is a semi-discretized heat 
transfer problem for determining the optimal cooling of steel profiles |39[ H3| [43] . We 
will refer to this model as the rail model [39] . The rail model has seven inputs and six 
outputs. Since we focus on SISO systems in this paper, we choose a SISO subsystem 
corresponding to the second input and sixth output. The rail model is available with 
1357, 5177, 20209, and 79841 unknowns, depending on the mesh size. 

As convergence tolerance for IRKA we use a relative change in the shifts of less 
than 10~ 6 . The matrices A and E of (|5.ip are symmetric negative definite and sym- 
metric positive definite (SPD), respectively. Since our shifts are real and positive at 
every IRKA step, (a^E — A) is always SPD. Nevertheless, RBiCG is advantageous 
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Primary System 


Dual System 


Start of 


Start of 


Start of 


Start of 


Start of 


Start of 


Run 2 


Run 3 


Run 4 


Run 2 


Run 3 


Run 4 


1.0000 


1.0000 


1.0000 


1.0000 


1.0000 


1.0000 


1.0000 


1.0000 


1.0000 


1.0000 


1.0000 


1.0000 


1.0000 


1.0000 


1.0000 


1.0000 


1.0000 


1.0000 


0.9896 


1.0000 


1.0000 


0.9950 


1.0000 


1.0000 


0.3832 


1.0000 


1.0000 


0.9884 


1.0000 


1.0000 


0.1452 


0.9983 


1.0000 


0.7864 


0.9844 


1.0000 


0.0988 


0.9437 


0.9970 


0.6070 


0.9206 


0.8141 


0.0300 


0.1869 


0.9567 


0.4749 


0.4118 


0.4721 



Table 6.1 

Convergence of the recycle space for the convection- diffusion problem as measured by the cosines 
of the principal angles between the primary (dual) recycle space and the right (left) invariant subspace 
associated with the eight eigenvalues of smallest magnitude. 



here because of the backward error formulation discussed in Section 15.21 We carry 
out two sets of experiments that differ in the dimension, r, of the reduced models. 
We also vary the frequency of computing a recycle space, as a recycle space can be 
effective for multiple consecutive systems S3 an d updating it may be expensive. 

We implement the first recycling strategy from Section liOl for a few selected shifts. 
As for the convection-diffusion example, the recycling parameters s and k are chosen 
based on experience with other recycling algorithms 38 , 53 . If a pair of linear systems 
converges in fewer than s iterations, the recycle space is not updated, and we use the 
previous recycle space for the next pair of systems in the sequence. The relative 
convergence tolerance for the iterative solves and the tolerance for constructing Cj 
and Cj in Section l4~3l are taken as 10 -6 . The linear systems are split-preconditioned 
with an incomplete LU preconditioner with threshold and pivoting (ILUTP) [41] . The 
drop tolerance varies per problem to avoid ill-conditioning; see Figures 16.21 - [(TBI The 
initial guess of the preconditioned system is the solution vector from the previous 
preconditioned system in the sequence. For the first IRKA step, we take a vector 
of all zeros as the initial guess. In general, a better initial guess may be based on 
knowledge of the system and aim to avoid orthogonal initial residuals (Algorithm 1 
Step 2; Algorithm 2 Step 3). 

For the first set of experiments, we reduce the models to r = 6 degrees of freedom, 
with 1.00 x 10~ 5 , 1.38 x 10 -4 , 1.91 x 10~ 3 , 2.63 x 10" 2 , 3.63 x 10" 1 , and 5.01 as initial 
shifts. 

We compute a recycle space at every IRKA step. The results for the primary 
systems at a particular IRKA step (given in the caption) are given in Figures 16.21 - 
16.51 The graphs for the other IRKA steps are similar, as are the graphs for the dual 
systems. We carry out recycling for the smallest two shifts. Each figure has two solid 
curves for the linear systems solved without recycling and two dashed-dotted curves 
for those solved with recycling. It is evident that recycling significantly reduces the 
number of iterations. The savings in iterations are as high as 70% per system. As 
discussed in Section f5.3[ convergence for the remaining four (larger) shifts is rapid, so 
recycling Krylov subspaces is not useful for these. 

Next, we analyze the recycle space generated during the first two IRKA steps for 
the order 5177 rail model corresponding to the smallest shift. In Table l(T2l we give 
the cosines of principal angles between the recycle space and the invariant subspace 
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1357 primary systems with s = 40, k = 10, and drop tol = 0.100 



1 r 




10 20 30 40 50 60 70 80 

Number of iterations 



Fig. 6.2. Convergence of preconditioned RBiCG at the 3 rd IRKA step for the n = 1357 rail 
model, with s = 40, k = 10, and the preconditioner drop tolerance is 0.1. 

5177 primary systems with s = 50, k = 10, and drop tol = 0.050 

3r 



Shift 1: Recycling BiCG 

t Shift 1 : BiCG 




20 40 60 30 100 120 

Number of iterations 



Fig. 6.3. Convergence of preconditioned RBiCG at the 2 IRKA step for the n = 5177 rail 
model, with s = 50, k = 10, and the preconditioner drop tolerance is 0.05. 



spanned by eight eigenvectors associated with the eigenvalues of smallest magnitude. 
As for the recycle space, the invariant subspace is computed for the preconditioned 
operator. For the primary system, we use the right invariant subspace. For the 
dual system, we use the left invariant subspace. As the recycle space improves, the 
principal angles tend to zero, and so the cosines tend to one. Consider the results for 
the primary system. At the first IRKA step and the end of the first cycle, we see that 
the recycle space captures a subspace of dimension four of the invariant subspace. 
The recycle space gets more accurate at the end of the second cycle and captures 
a subspace of dimension seven. For the second IRKA step, we have a new shift, 
and so the matrix changes. Therefore, at the start of the first cycle, we see a slight 
deterioration of the recycle space (almost negligible). This recycle space leads to the 
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Primary System 


Dual System 


IRKA Step 1 


IRKA Step 2 


IRKA Step 1 


IRKA Step 2 


(71 = 


1.000 x 10~ 5 


cti = 1.834 x 10~ 5 


(Tl = 


1.000 x 10~ 5 


<Ti = 1.834 x 10" 5 


End of 


End of 


Start of 


End of 


End of 


End of 


Start of 


End of 


Cycle 1 


Cycle 2 


Cycle 1 


Cycle 1 


Cycle 1 


Cycle 2 


Cycle 1 


Cycle 1 


1.0000 


1.0000 


1.0000 


1.0000 


1.0000 


1.0000 


1.0000 


1.0000 


1.0000 


1.0000 


1.0000 


1.0000 


1.0000 


1.0000 


1.0000 


1.0000 


1.0000 


1.0000 


1.0000 


1.0000 


0.9997 


1.0000 


1.0000 


1.0000 


0.9987 


1.0000 


1.0000 


1.0000 


0.9765 


1.0000 


1.0000 


1.0000 


0.9321 


1.0000 


1.0000 


1.0000 


0.4936 


1.0000 


1.0000 


1.0000 


0.2257 


1.0000 


0.9998 


0.9999 


0.0844 


0.9995 


0.9997 


0.9998 


0.0260 


0.9997 


0.9996 


0.9997 


0.0231 


0.9945 


0.9945 


0.9989 


0.0072 


0.7813 


0.7799 


0.9932 


0.0068 


0.3439 


0.3423 


0.9876 



Table 6.2 



Convergence of the recycle space for the sequence of linear systems corresponding to the 5177 
rail model and the smallest shift, as measured by the cosines of the principal angles between the 
primary (dual) recycle space and the right (left) invariant subspace associated with the eight eigen- 
values of smallest magnitude. The third column corresponds to the dashed convergence curve in 
FiaurelK3\ 



20209 primary systems with s = 40, k = 20, and drop tol = 0.010 



4r 




_gl 1 1 1 1 1 1 1 1 1 1 

10 20 30 40 50 60 70 80 90 100 

Number of iterations 

Fig. 6.4. Convergence of preconditioned RBiCG at the 2 nd IRKA step for the n = 20209 rail 
model, with s = 40, k = 20, and the preconditioner drop tolerance is 0.01. 

dashed curve in Figure 1(5751 By the end of the first cycle (at the second IRKA step), 
all eight eigenvectors are captured. The results for the dual system recycle space are 
similar. 

For the second set of experiments, we reduce the models to r = 3 degrees of 
freedom, using as initial shifts, 1.00 x 10 _5 ,7.08 x 10~ 3 , and 5.01. We compute the 
recycle space at every fifth IRKA step. The results are given in Table 16.31 We im- 
plement recycling for the smallest shift only. The linear systems corresponding to the 
two (larger) shifts converge fast, so recycling Krylov subspaces is not useful for these. 
Total iteration count refers to the sum of iteration counts for solving linear systems 
over all shifts and all IRKA steps. Total time is the time in seconds required by IRKA 
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79841 primary systems with s = 50, k = 20, and drop tol = 0.005 



Shift 1 : Recycling BiCG 

Shift 1 : BiCG 

Shift 2: Recycling BiCG 




40 60 80 100 

Number of iterations 



Fig. 6.5. Convergence of preconditioned RBiCG at the 3 rd IRKA step for the n = 79841 rail 
model, with s = 50, k = 20, and the preconditioner drop tolerance is 0.005. 



to converge to the ideal shifts. This includes the time for all IRKA computations as 
well as all linear solves (BiCG or RBiCG, as the case may be). The table illustrates 
that computing the reduced model without recycling takes about 50% more time than 
with recycling. Obviously, the improvement for just the pairs of linear systems where 
recycling is actually used is substantially larger. 



Size 


s 


k 


Drop 
tol 


IRKA 
steps 


Total iteration count 




Total time (s) 






BiCG 


RBiCG 


Ratio 


BiCG 


RBiCG 


Ratio 


20209 


10 


20 


0.01 


31 


3032 


1434 


2.11 


73.82 


54.28 


1.36 


79841 


50 


20 


0.005 


44 


6324 


2547 


2.48 


742.83 


505.09 


1.47 



Table 6.3 

The total number of iterations and computation time over all IRKA iterations with BiCG 
and with RBiCG for the linear systems with the smallest shift. Total time includes the time for all 
computations to compute the reduced model. 



7. Conclusion. We focus on efficiently solving sequences of dual linear systems. 
For several classes of problems, such as the linear systems arising in interpolatory 
model reduction, or bilinear forms arising in Quantum Monte Carlo methods, the 
BiCG algorithm has advantages over methods like GMRES that would solve the 
primary and the dual system separately. For sequences of dual linear systems arising 
in such problems, it is advantageous to use Krylov subspace recycling for the BiCG 
algorithm, and for this purpose we propose the RBiCG algorithm. The derivation of 
RBiCG also provides the foundation for recycling variants of other popular bi-Lanczos 
based methods, like CCS, BiCGSTAB, QMR, and TFQMR [3]. 

We have demonstrated the usefulness of RBiCG for interpolatory model reduc- 
tion using IRKA, an application that may be an important niche for this solver. In 
addition, we have analyzed and demonstrated the effectiveness of RBiCG for non- 
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symmetric linear systems arising from convection-diffusion problems. This suggests 
that the RBiCG method may be useful in other areas where solving dual systems in 
a Petrov-Galerkin sense brings special advantages. 

In future work, we plan to extend the use of RBiCG to model reduction for MIMO 
dynamical systems in a tangential interpolation framework where the right-hand sides 
are not constant as in the SISO case. In addition, we will investigate the use of RBiCG 
for evaluating bilinear forms arising in QMC algorithms. Our current results for this 
look promising. 

Acknowledgments. We thank the anonymous reviewers for their careful and helpful 
suggestions, which greatly helped us to improve this paper. 
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